231 research outputs found
A random forest system combination approach for error detection in digital dictionaries
When digitizing a print bilingual dictionary, whether via optical character
recognition or manual entry, it is inevitable that errors are introduced into
the electronic version that is created. We investigate automating the process
of detecting errors in an XML representation of a digitized print dictionary
using a hybrid approach that combines rule-based, feature-based, and language
model-based methods. We investigate combining methods and show that using
random forests is a promising approach. We find that in isolation, unsupervised
methods rival the performance of supervised methods. Random forests typically
require training data, so we investigate how to apply random forests to
combine individual base methods that are themselves unsupervised without
requiring large amounts of training data. Experiments reveal empirically that a
relatively small amount of data is sufficient and can potentially be further
reduced through specific selection criteria.
Comment: 9 pages, 7 figures, 10 tables; appeared in Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, April 201
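The combination idea can be illustrated with a minimal sketch (assuming scikit-learn is available; the three base "detectors", their noise levels, and the data are synthetic stand-ins, not the paper's methods): each unsupervised detector emits a score per dictionary entry, and a random forest trained on a small labeled sample combines them.

```python
# Sketch: combine unsupervised base detectors with a random forest.
# The three "detectors" below are noisy synthetic stand-ins, not the
# paper's rule-based / feature-based / language-model methods.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic dictionary entries: 1 = contains an error, 0 = clean.
y = rng.integers(0, 2, size=400)

# Each unsupervised detector emits a noisy score correlated with the label
# (think: rule-violation count, anomaly score, language-model perplexity).
def noisy_score(labels, noise):
    return labels + rng.normal(0.0, noise, size=labels.shape)

X = np.column_stack([
    noisy_score(y, 0.8),   # stand-in for a rule-based score
    noisy_score(y, 1.0),   # stand-in for a feature-based score
    noisy_score(y, 1.2),   # stand-in for a language-model score
])

# Train the combiner on a deliberately small labeled subset (50 entries),
# mirroring the finding that little training data suffices.
train, test = slice(0, 50), slice(50, None)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[train], y[train])

acc = clf.score(X[test], y[test])
print(f"combined accuracy on held-out entries: {acc:.2f}")
```

The interesting property is that only the small combiner needs labels; the base methods themselves stay unsupervised.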
IOD-CNN: Integrating Object Detection Networks for Event Recognition
Many previous methods have shown the importance of considering semantically
relevant objects for event recognition, yet none of these methods has
exploited the power of deep convolutional neural networks to directly integrate
relevant object information into a unified network. We present a novel unified
deep CNN architecture which integrates architecturally different, yet
semantically-related object detection networks to enhance the performance of
the event recognition task. Our architecture allows the sharing of the
convolutional layers and a fully connected layer which effectively integrates
event recognition, rigid object detection and non-rigid object detection.
Comment: submitted to IEEE International Conference on Image Processing 201
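The layer-sharing idea can be sketched with a toy fully connected network in pure NumPy (the layer sizes, head names, and dense-instead-of-convolutional layers are illustrative assumptions, not the paper's architecture): a shared trunk is computed once per image and reused by every task head.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shared layers: computed once per input, reused by every task head.
W_shared = rng.normal(size=(512, 128))

# Task-specific heads (names mirror the three tasks in the abstract;
# the dimensions are arbitrary for the sketch).
heads = {
    "event":     rng.normal(size=(128, 10)),
    "rigid_obj": rng.normal(size=(128, 20)),
    "nonrigid":  rng.normal(size=(128, 20)),
}

x = rng.normal(size=(1, 512))   # stand-in for a CNN feature vector
h = relu(x @ W_shared)          # shared computation, done once

outputs = {name: h @ W for name, W in heads.items()}
for name, out in outputs.items():
    print(name, out.shape)
```

During training, gradients from all three heads would flow into the shared trunk, which is what lets the detection tasks inform the event representation.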
Player Re-Identification Using Body Part Appearances
We propose a neural network architecture that learns body part appearances
for soccer player re-identification. Our model consists of a two-stream network
(one stream for appearance map extraction and the other for body part map
extraction) and a bilinear-pooling layer that generates and spatially pools the
body part map. Each local feature of the body part map is obtained by a
bilinear mapping of the corresponding local appearance and body part
descriptors. Our novel representation yields a robust image-matching feature
map, which results from combining the local similarities of the relevant body
parts with the weighted appearance similarity. Our model does not require any
part annotation on the SoccerNet-V3 re-identification dataset to train the
network. Instead, we use a sub-network of an existing pose estimation network
(OpenPose) to initialize the part substream and then train the entire network
to minimize the triplet loss. The appearance stream is pre-trained on the
ImageNet dataset, and the part stream is trained from scratch for the
SoccerNet-V3 dataset. We demonstrate the validity of our model by showing that
it outperforms state-of-the-art models such as OsNet and InceptionNet.
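The bilinear pooling step can be sketched in NumPy (the spatial grid and descriptor dimensions are illustrative, not the paper's): at every spatial location, the outer product of the local appearance descriptor and the local body-part descriptor is taken, and the results are summed over space.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 8, 4            # spatial grid of the two feature maps
Da, Dp = 16, 6         # appearance / body-part descriptor dimensions

appearance = rng.normal(size=(H, W, Da))   # appearance-stream output
parts      = rng.normal(size=(H, W, Dp))   # body-part-stream output

# Bilinear map at each location: outer product of the two local
# descriptors, then spatial sum pooling -> a (Da x Dp) image-level
# representation.
pooled = np.einsum('hwa,hwp->ap', appearance, parts)

print(pooled.shape)    # (16, 6)
```

Each entry of `pooled` pairs one appearance channel with one part channel, which is what makes the matching feature sensitive to *where* on the body an appearance cue occurs.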
Context-Aware Chart Element Detection
As a prerequisite for chart data extraction, accurate detection of basic
chart elements is essential. In contrast to object detection in
the general image domain, chart element detection relies heavily on context
information as charts are highly structured data visualization formats. To
address this, we propose a novel method CACHED, which stands for Context-Aware
Chart Element Detection, by integrating a local-global context fusion module
consisting of visual context enhancement and positional context encoding with
the Cascade R-CNN framework. To improve the generalization of our method for
broader applicability, we refine the existing chart element categorization and
standardize 18 classes for basic chart elements, excluding plot elements. Our
CACHED method, with the updated category of chart elements, achieves
state-of-the-art performance in our experiments, underscoring the importance of
context in chart element detection. Extending our method to the bar plot
detection task, we obtain the best result on the PMC test dataset.
Comment: Published in ICDAR 2023. Code and model are available at
https://github.com/pengyu965/ChartDet
Learning Higher-order Transition Models in Medium-scale Camera Networks
We present a Bayesian framework for learning higher-order transition models in video surveillance networks. Such higher-order models describe object movement between cameras in the network and have greater predictive power for multi-camera tracking than camera adjacency alone. These models also provide inherent resilience to camera failure, filling in gaps left by single or even multiple non-adjacent camera failures. Our approach to estimating higher-order transition models relies on the accurate assignment of camera observations to the underlying trajectories of objects moving through the network. We address this data association problem by gathering the observations and evaluating alternative partitions of the observation set into individual object trajectories. Searching the complete partition space is intractable, so an incremental approach is taken, iteratively adding observations and pruning unlikely partitions. Partition likelihood is determined by the evaluation of a probabilistic graphical model. When the algorithm has considered all observations, the most likely (MAP) partition is taken as the true set of object trajectories. From these recovered trajectories, the higher-order statistics we seek can be derived and employed for tracking. The partitioning algorithm we present is parallel in nature and can be readily extended to distributed computation in medium-scale smart camera networks.
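The incremental partitioning can be sketched as a beam search over partitions (pure Python; the likelihood function below is a hypothetical stand-in for the paper's graphical-model evaluation, scoring smooth camera-ID transitions and penalizing extra trajectories): each new observation either extends an existing trajectory or starts a new one, and only the most likely partitions survive each step.

```python
def extend(partition, obs):
    """All ways to place obs: append to an existing trajectory, or start a new one."""
    for i in range(len(partition)):
        yield [t + [obs] if j == i else list(t) for j, t in enumerate(partition)]
    yield [list(t) for t in partition] + [[obs]]

def log_likelihood(partition):
    # Hypothetical stand-in for the graphical-model evaluation: smooth
    # camera-ID transitions are likely, and each trajectory pays a prior cost.
    score = -2.5 * len(partition)
    for traj in partition:
        for a, b in zip(traj, traj[1:]):
            score -= abs(a - b)
    return score

def incremental_map_partition(observations, beam_width=5):
    beam = [[]]                                    # start from the empty partition
    for obs in observations:
        candidates = [p for part in beam for p in extend(part, obs)]
        candidates.sort(key=log_likelihood, reverse=True)
        beam = candidates[:beam_width]             # prune unlikely partitions
    return beam[0]                                 # MAP partition under the beam

# Camera IDs of observations; nearby IDs likely belong to the same object.
print(incremental_map_partition([1, 7, 2, 8, 3]))  # -> [[1, 2, 3], [7, 8]]
```

The pruning is what keeps the search tractable: the full partition space grows as the Bell numbers, while the beam evaluates only a bounded set of candidates per observation.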
A Survey and Approach to Chart Classification
Charts represent an essential source of visual information in documents and
facilitate a deep understanding and interpretation of information typically
conveyed numerically. In the scientific literature, there are many charts, each
with its own stylistic differences. Recently, the document understanding community
has begun to address the problem of automatic chart understanding, which begins
with chart classification. In this paper, we present a survey of the current
state-of-the-art techniques for chart classification and discuss the available
datasets and their supported chart types. We broadly classify these
contributions as traditional approaches based on ML, CNN, and Transformers.
Furthermore, we carry out an extensive comparative performance analysis of
CNN-based and transformer-based approaches on the recently published CHARTINFO
UB-UNITECH PMC dataset for the CHART-Infographics competition at ICPR 2022. The
dataset includes 15 different chart categories, with 22,923 training
images and 13,260 test images. We have implemented a vision-based transformer
model that produces state-of-the-art results in chart classification.
Comment: Accepted at the 15th IAPR Workshop on Graphics Recognition (GREC) 2023 in
conjunction with 17th International Conference on Document Analysis and
Recognition (ICDAR) 2023, August 21-26, 2023 San Jose, US